EVolutionary Independent DEtermiNistiC Explanation

Dentamaro, Vincenzo, Giglio, Paolo, Impedovo, Donato, Pirlo, Giuseppe

arXiv.org Artificial Intelligence

Current explainability methods often produce inconsistent results and struggle to highlight essential signals influencing model inferences. This paper introduces the Evolutionary Independent Deterministic Explanation (EVIDENCE) theory, a novel approach offering a deterministic, model-independent method for extracting significant signals from black-box models. EVIDENCE theory, grounded in robust mathematical formalization, is validated through empirical tests on diverse datasets, including COVID-19 audio diagnostics, Parkinson's disease voice recordings, and the George Tzanetakis music classification dataset (GTZAN). Practical applications of EVIDENCE include improving diagnostic accuracy in healthcare and enhancing audio signal analysis. For instance, in the COVID-19 use case, EVIDENCE-filtered spectrograms fed into a frozen Residual Network with 50 layers (ResNet50) improved precision by 32% for positive cases and increased the Area Under the Curve (AUC) by 16% compared to baseline models. For Parkinson's disease classification, EVIDENCE achieved near-perfect precision and sensitivity, with a macro average F1-Score of 0.997. In the GTZAN, EVIDENCE maintained a high AUC of 0.996, demonstrating its efficacy in filtering relevant features for accurate genre classification. EVIDENCE outperformed other Explainable Artificial Intelligence (XAI) methods such as Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Gradient-weighted Class-Activation Mapping (GradCAM) in almost all metrics. These findings indicate that EVIDENCE not only improves classification accuracy but also provides a transparent and reproducible explanation mechanism, crucial for advancing the trustworthiness and applicability of AI systems in real-world settings.
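
The macro average F1-Score quoted above is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The following is a minimal sketch of that metric in plain Python. It is not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): four samples of class 0, two of class 1
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, the single missed positive in the toy data pulls the score down more than it would under plain accuracy.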


Comparison of Feature Learning Methods for Metadata Extraction from PDF Scholarly Documents

Boukhers, Zeyd, Yang, Cong

arXiv.org Artificial Intelligence

The availability of metadata for scientific documents is pivotal in propelling scientific knowledge forward and for adhering to the FAIR principles (i.e. Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance. We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of various feature learning and prediction methods, which can guide future research in this field.


Predicting Coronary Heart Disease Using a Suite of Machine Learning Models

Al-Karaki, Jamal, Ilono, Philip, Baweja, Sanchit, Naghiyev, Jalal, Yadav, Raja Singh, Khan, Muhammad Al-Zafar

arXiv.org Artificial Intelligence

Coronary Heart Disease affects millions of people worldwide and is a well-studied area of healthcare. There are many viable and accurate methods for the diagnosis and prediction of heart disease, but they have limitations such as invasiveness, late detection, or cost. Supervised learning via machine learning algorithms presents a low-cost (computationally speaking), non-invasive solution that can be a precursor for early diagnosis. In this study, we applied several well-known methods and benchmarked their performance against each other. It was found that Random Forest with oversampling of the predictor variable produced the highest accuracy of 84%.
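
The abstract does not say which oversampling technique was used. As one possible reading, the sketch below shows plain random oversampling, which duplicates minority-class rows until every class is as large as the largest one. The function name `random_oversample`, the seed, and the toy rows are assumptions for illustration only, and this is not the authors' pipeline.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw with replacement only the number of extra rows this class needs
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): five negative rows, two positive rows
rows = [[50, 1], [61, 0], [47, 1], [58, 0], [39, 1], [66, 1], [70, 0]]
labels = [0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, since duplicating rows before the train/test split would leak copies of training rows into the test set.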


Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records

Chuang, Yao-Shun, Jiang, Xiaoqian, Lee, Chun-Teh, Brandon, Ryan, Tran, Duong, Tokede, Oluwabunmi, Walji, Muhammad F.

arXiv.org Artificial Intelligence

The extent is indicated by the percentage of teeth affected by periodontitis at the identified stage. Grading depends on the risk of disease progression associated with the history of disease progression, local and systemic factors. Despite the introduction of new diagnostic terms for periodontal diseases, dental care providers might not be acquainted with them due to the complexity of this new system. This results in clinical documentation lacking accurate and structured diagnosis, or in some cases, no diagnosis being recorded. Inadequate periodontal diagnoses pose a significant threat to patient care quality. An accurate diagnosis is key to the provision of appropriate patient care, outcome assessment and quality improvement efforts. This, in turn, may hinder future care providers from evaluating the patient's condition precisely and providing optimal treatment. Electronic dental records (EDR) have become widely adopted in dental care, providing an opportunity to address the issue of missing diagnoses. EDRs include comprehensive information on a patient's history, clinical examination, diagnosis, treatment, and prognosis.


SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions for Collocations in Spanish

Kostiuk, Yevhen, Sidorov, Grigori, Kolesnikova, Olga

arXiv.org Artificial Intelligence

In natural language processing (NLP), a lexical function is a concept, first introduced in Meaning-Text Theory, for unambiguously representing the semantic and syntactic features of words and phrases in text. Hierarchical classification of lexical functions involves organizing these features into a tree-like hierarchy of categories or labels. This is a challenging task, as it requires a good understanding of the context and the relationships among words and phrases in text. It also needs large amounts of labeled data to train language models effectively. In this paper, we present a dataset of the most frequent Spanish verb-noun collocations and the sentences where they occur. Each collocation is assigned to one of 37 lexical functions, defined as classes for a hierarchical classification task. Each class represents a relation between the noun and the verb in a collocation involving their semantic and syntactic features. We combine the classes in a tree-based structure and introduce classification objectives for each level of the structure. The dataset was created by dependency tree parsing and matching of the phrases in Spanish news. We provide baselines and data splits for each objective.


Overlapping Word Removal is All You Need: Revisiting Data Imbalance in Hope Speech Detection

LekshmiAmmal, Hariharan RamakrishnaIyer, Ravikiran, Manikandan, Nisha, Gayathri, Balamuralidhar, Navyasree, Madhusoodanan, Adithya, Madasamy, Anand Kumar, Chakravarthi, Bharathi Raja

arXiv.org Artificial Intelligence

Hope Speech Detection, the task of recognizing positive expressions, has made significant strides recently. However, much of the current work focuses on model development without considering the inherent imbalance in the data. Our work revisits this issue in hope-speech detection by introducing focal loss, data augmentation, and pre-processing strategies. We find that introducing focal loss as part of the training process of Multilingual-BERT (M-BERT) mitigates the effect of class imbalance and improves overall F1-Macro by 0.11. At the same time, contextual and back-translation-based word augmentation with M-BERT improves results by 0.10 over the baseline despite the imbalance. Finally, we show that overlapping word removal during pre-processing, though simple, improves F1-Macro by 0.28. In the process, we present detailed studies of the behavior of each of these strategies and summarize key findings from our empirical results for those interested in getting the most out of M-BERT for hope speech detection under real-world conditions of data imbalance.
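
To make the focal-loss idea mentioned in this abstract concrete, here is a minimal sketch of the standard per-example formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), in plain Python. The abstract does not give the hyperparameters used with M-BERT, so `gamma=2.0` and `alpha=1.0` are assumptions, and the function names are illustrative rather than taken from the paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    probs: predicted class probabilities; target: index of the true class.
    gamma and alpha are assumed values, not taken from the paper.
    """
    # Clamp to avoid log(0) when the model assigns zero probability
    p_t = min(max(probs[target], 1e-12), 1.0)
    # (1 - p_t)^gamma shrinks the loss of examples the model already gets right
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, target):
    """With gamma = 0 and alpha = 1, focal loss reduces to cross-entropy."""
    return focal_loss(probs, target, gamma=0.0, alpha=1.0)


# Toy probabilities (made up): a confident correct prediction and a poor one
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)
```

The modulating factor leaves hard examples nearly untouched while down-weighting easy ones, which is why the loss is often tried when a majority class dominates training.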